home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Shareware Grab Bag
/
Shareware Grab Bag.iso
/
013
/
narc12.arc
/
NARC.DOC
< prev
next >
Wrap
Text File
|
1987-08-05
|
65KB
|
1,668 lines
NARC - A STAND-ALONE DE-ARCHIVE UTILITY
(no other files required)
Documentation for NARC.EXE
Written by Gary Conway
Infinity Design Concepts
Louisville, Kentucky
Copyright (c) 1987
Version 1.2
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
NARC.EXE is placed in the public domain under the user
supported shareware concept. NARC.EXE is and will remain the
property of Gary Conway. This program may not be used in any
connection with commercial ventures, nor as a sales aid,
without the expressed written consent of the author. All
rights are reserved.
Infinity Design Concepts
1052 Parkway Drive
Louisville, Kentucky 40217
Member IEEE
KIPCUG
PCCL
KKUG
NSPE
All new releases of NARC.EXE and all other IDC utilities
can be located -FIRST- on ;
The SoftStone RCPM FOG #24 (supporting CP/M and MSDOS)
(502)241-4109
30 MEG
300/1200 baud
24 hrs.
Louisville, Kentucky
Curt Edwards - SYSOP
Sponsored by: Kentucky Kaypro Users Group
Accounting Computer Systems
First Osborne Group
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
REGISTRATION
If you find yourself using NARC, please take the time
to do the right thing and that is register your copy. You
have been provided the opportunity to freely test the program
before even thinking about registering. This is only fair,
so, in fairness, you should reciprocate and register your
copy, if you continue using the program. There are several
ways to register NARC. Registered users will be notified of
updates to the program.
Disk only with current version .................. $20.00
(includes manual on disk)
Printed manual .................................. $15.00
(bound and printed manual)
Printed manual and disk ......................... $35.00
Site License ................................... $50.00
(required for business use)
You will find the registration form in the ARChive with
this document under the name REGISTER.FRM. Please use this
form for registration.
THIS IS NOT A FREE PROGRAM
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
═════════════════════════════════════════
70% F A S T E R E X T R A C T I O N
═════════════════════════════════════════
There has been a large number of requests for an increase in
extraction speed. NARC version 1.2 demonstrates a 70%
increase over previous releases.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
.............................................................................
........................ ............................
........................ TABLE OF CONTENTS ............................
.............................................................................
Page
WHAT IS IT ANYWAY.................................. 1
ACKNOWLEDGEMENTS................................... 1
COMPATIBILITY...................................... 2
Author's Ramblings............................. 2
ARChive Storage Methods Supported By NARC...... 2
Packing
Squeezing
Crunching
Squashing
OVERVIEW........................................... 3
COMMANDS........................................... 4
Extract Command................................ 4
View Command................................... 5
Print Command.................................. 5
ARC-wind Command............................... 6
DRV-wind Command............................... 6
SUB-wind Command............................... 6
Quit Command................................... 6
ALTERNATE COMMANDS................................. 6
Function keys
Find Command
Kill File Command
Page UP, DOWN, HOME,END
OPERATING HINTS AND SHORTCUTS...................... 8
ERROR MESSAGES..................................... 9
ARCHIVE FILE FORMATS AND GENERAL INFORMATION....... 10
Packing........................................ 10
HUFFMAN CODING (SQUEEZING)......................... 11
CRUNCHING (LZW COMPRESSION)........................ 15
DETAILS OF STORAGE VERSIONS........................ 17
ARChive file Header Structure.................. 19
HASHING............................................ 20
CRC - calculations............................. 20
ARC RELEASE DATES AND VERSIONS..................... 21
NARC REVISION HISTORY.............................. 22
........................................................................
Narc (c) 1987 Infinity Design Concepts all rights reserved
........................................................................
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
═══════════════════
WHAT IS IT ANYWAY ?
═══════════════════
NARC is a menu driven de-ARChive facility, written entirely
in assembler. NARC allows you to easily move from ARC file to
ARC file, with the option of viewing, printing, extracting or
deleting the subfiles from the ARChive. The program may be
operated from the mouse or the keyboard. Menus are of the
musical popup variety to add a little "TechNoFlash" to the
proceedings. NARC is the culmination of about six months of
frustrating effort and 8000 + lines of 8088 source code.
Why....
Because I use a lot of ARC files and ARC.EXE and the
clones are reminiscent of the early Ward Christensen
CP/M days in user interface etiquette, I wanted something a
little more flexible and friendly to use. I would like to
pause here for a second and give a little credit to Mr.
Christensen ( the Don Garlits of CP/M ) for the fine FREE
utilities he has given to ALL of us over the years. The next
time you do a modem transfer, you can thank him for the
original XMODEM from which all others have transpired.
Why NARC...
It seemed like a good idea. Short for uN-ARC. The idea was
originally Bob Freed's.
Acknowledgments..
I would like to thank Bob Freed for his allowing me to
examine his Z80 code before writing NARC. Bob wrote UNARC for
the CP/M world and is ( as of this writing 4/28/87) working
on NOAH the ARCing program for CP/M. I would also like to
thank System Enhancement Associates for releasing their
source code in "C". Without both of the above, NARC would
have been a much larger chore than it was. Note also that the
crunching algorithm used in ARC.EXE was taken from COMPRESS,
used in UNIX. A special thanks to Curt Edwards, Jerry Taylor,
Frank Roemer, Paul Bowling and Ken Romines for their "eagle-
eyes" in locating errors.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 1
═════════════
COMPATIBILITY
═════════════
NARC is compatible with all known "skrunching" algorithms,
that is up to and including Squashing. NARC is compatible
with ARC.EXE version 5.21 and PKxARC. NARC supports Squashing
which is nothing more than a variation of the crunching
algorithm, yet it is the easiest (and most logical) of the
crunching methods to code. I have heard a lot of criticism of
squashing, but those folks need to get up with the times,
squashing is (and should be) here to stay. NARC also
recognizes the .ARK extension soon to be prevalent in the
CP/M world via Bob Freed's CP/M archive facility, NOAH.
Author's Ramblings
System Enhancement Associates, I am told is dropping the ball
as far as ARC.EXE goes. I think that Thom Henderson deserves
a great round of applause for his contribution and help with
a formidable problem, namely, storage space (or lack
thereof).
The oldest version of ARC.EXE that I can find is version
3.10, released 5-1-85. This version supports storage methods
up to and including squeezing (no crunching). If anyone has
an older version I would be interested in seeing it. Here is
a list of the versions that I do have and would be interested
in getting any other versions floating around.
3.10 4.10 4.50 4.52 5.00 5.10 5.12 5.20
═════════════════════════════════════════
ARCHIVE STORAGE METHODS SUPPORTED BY NARC
═════════════════════════════════════════
Packing - all versions
Squeezing - Huffman Coding
Crunching - all versions (LZW encoding)
Squashing - one version
Note: LZW stands for Lempel-Ziv-Welch
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 2
OverView...
When NARC is first invoked, it saves the current drive/path
for use again on exit, so you always end up where you
started. Some folks like that and some don't. I DO. NARC
first searches the default path for ARC files and if any are
found they are displayed in a window on the left side of the
screen. The arrow keys (or the mouse) may be used to move the
cursor bar up and down the window, there are three ways to
select the highlighted ARC file ;
(1) Hit the ENTER key
(2) Press the left mouse button
(3) Hit the F1 key.
NOTE: Function keys F1 and F2 mimic the mouse buttons as ;
F1 = left mouse button
F2 = right mouse button
F3 = both mouse buttons
NARC will determine whether a monochrome or color monitor is
being used and act accordingly. EGA is not directly supported
at this point, because I don't have one and can't test the
routines.
After selecting the sub-file of interest, NARC displays all
of the ARC sub-files and their statistics on the screen. You
are also given a menu bar at the bottom of the screen. You
may use the arrow keys or the mouse to move the cursor bar to
the desired selection and then select with the F1 key or the
ENTER key or the left mouse button. As the cursor bar is
moved, you are also given a brief description of the
highlighted command. The commands will now be discussed
individually.
Note: You may also select any option from the command bar by
entering the first letter of the command.
The ESCape key will exit the program from any of the windows
or the command bar.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 3
════════
COMMANDS
════════
═══════════════
Extract Command
═══════════════
Selecting this option will cause another prompt to be
displayed, asking whether you wish to extract the highlighted
file or tagged files. (Files are tagged with the space bar).
"Point and shoot" here again as before. The ESC key or the
right mouse button will abort the operation. If the disk
becomes full, you will be informed and have the option of
aborting or continuing.
Highlighted File
When EXTRACT is selected, you will be asked for a
drive/path to extract the file to. If you simply hit ENTER or
the F1 key or the left mouse button, the file will be
extracted to the default drive/path. You may also enter any
valid DOS drive/path. The ESC key or the F2 key or the right
mouse button will abort the operation.
Tagged Files
The Space bar (or F3 key) is used to TAG the current file.
When a file is tagged, an asterisk will be displayed on the
line with the current file in column 80. As a side note, the
asterisk was chosen as a reminder of the CP/M days of NSWEEP
and B29. The space bar will also unTAG a file, thus it is a
"toggle". When the space bar is pressed, an asterisk will
appear as described above and the cursor bar will move to
the next file. When the "TAGGED" option is selected from the
command line, all files that have been tagged, will be
extracted to the SAME drive/path.
After the file is extracted, it's date and time are set to
those contained in the ARC file. The file is also checked for
size and CRC, if both of these do not match exactly what was
contained in the ARC file header, then an error has occurred
and the user is notified The files will also remain tagged
after the extraction.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 4
════════════
View Command
════════════
This option will display the currently highlighted file on
the screen. The space bar is used to pause and the ESC key is
used to abort. No mouse support here, since it slows things
down too much.
═════════════
Print Command
═════════════
The print option will print the currently highlighted file.
After selecting the print option, you will be asked which
character pitch you want to print in. Enter the number that
you wish (or 0 for the default pitch) and the printer will be
set to that pitch.
NOTE: The printer strings that come installed with NARC are compatible
with EPSON printer strings. If you wish to install NARC with
your own strings, see NARCCFG.DOC for complete instructions.
After you have selected the printer pitch, you will be
shown three more options. Use the arrow keys to move left and
right. The space bar (or right mouse) is used to toggle the
options ON or OFF. When you have finished and are ready to
print, hit ENTER (or left mouse button). The options are;
Format - YES - This option causes NARC to format the
output with page breaks and page numbers.
NO - NARC does not format the file.
The following two options work independently of the Format option.
Strip High - YES - NARC will strip the high bit off each
character before it is sent to the
printer. Some word processors set this
high bit on some characters as a means of
text formatting. These characters will
print as garbage usually.
NO - NARC will not strip the high bit.
Strip Control - YES - NARC will strip all control characters
from the file before it is printed. This
is useful on files that have embedded
formatting characters, and you wish to
have NARC provide the formatting.
NO - NARC will not strip the control chars.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 5
════════════════
ARC-wind Command Note: The right mouse button will also pop
════════════════ this window up.
This option will display the window containing all of the
ARC files in the current sub-directory. Move the cursor bar
up and down with the mouse or arrow keys and select with the
F1 key or ENTER key or left mouse button.
════════════════
DRV-wind Command
════════════════
This option will pop up a window that contains all of the
logical drives that DOS reports to NARC. Select as before and
the ARC-window will be popped up so that you can then choose
an ARC file to examine.
════════════════
SUB-wind Command
════════════════
This option will pop up a window that contains all of the
sub-directories in the current directory. Point and shoot as
before. After making your selection, the window is
automatically popped up again showing the sub-directories in
the new current directory. When you get to the directory that
you want, hit the F2 key or the right mouse button and the
window will be put away and the ARC-window will be popped up
so that you can then select an ARC file.
════════════
Quit Command
════════════
I hope I don't need to describe this one.
NOTE: The ESCape key will also exit NARC.
══════════════════
ALTERNATE COMMANDS
══════════════════
The extra commands can be located on the help screen, which
is invoked by the F10 key.
F1 - key
Will select the highlighted option.
F2 - key
This keys function varies with the window that is on the
screen at any one time. When an ARC file is opened and the
subfile list is onscreen, this key will pop up the ARC file
window again. Any other use of this key is given at the
appropriate time on the screen.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 6
F3 - key
This key will tag a subfile and move the cursor bar on to
the next subfile. This key also has other functions, and they
are also shown on the screen when necessary.
F4 - key
The F4 key will print an image of the screen less the
advertisement and command lines.
F5 - key
Invokes the NARC-DOS command processor. You may then enter
any valid DOS command. When finished, simply hit ENTER
by itself and you will be returned to NARC. You may also
enter "COMMAND" which will invoke a second copy of
COMMAND.COM, if the file COMMAND.COM is in your search path.
To return to NARC, you would then type "EXIT".
F6 - key
This key will tag all of the subfiles in the ARChive for
subsequent extraction.
F7 - key
This key will invert all of the tags on the subfiles, that
is all files that were tagged will become untagged and vice-
versa.
F10 - key
This key will display the NARC help screen with all of the
alternate commands listed and a brief description of their
functions.
(F)ind command.
Will prompt for a wildcard filename to find in the sub-file
list. Any number of characters may be used, for example, you
may enter a single character and NARC will find the first
file whose name begins with that character. Alternatively,
you may enter a complete wildcard specification and NARC will
attempt to find a match.
(K)ill file command.
Will remove the currently highlighted sub-file from the
ARChive. No additional disk space is required for temporary
files.
PgUP,PgDN,Home and End
These keys do what you might expect. They only work when
you have an ARC file opened and the subfiles are displayed on
the screen. These functions are nice for large ARC files.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 7
Operating Hints and Philosophy and Shortcuts
When NARC is first invoked, it pops up the window showing
all of the ARC files in the current directory. The first
thing that you must always do is to select an ARC file. The
reason for this is that when I want to look inside an ARC
file, I will move to that drive/directory and then call
NARC. This saves me from having to select where I want to
look and what drive and all that mess. This way, when the
program comes up, I can go right to work. If you run NARC
and then decide to change drives or directories,you MUST
first select an ARC file from the first window.
When there are no windows popped up, the right mouse
button (or F2 key) will pop up the ARC file window,
regardless of where the command line cursor bar is. This is
handy when you have a lot of ARC's that you want to thumb
through, with just the 2 mouse buttons or F1 and F2 keys,
you can look through a whole directory of ARC files in
nothing flat ! Also along these lines, when the ARC window
is on the screen, the right mouse button or the ESCape key
will exit the program.
The control system is a little tricky at first,
but I think you will like it once you get used to it.
NARC buffers 64k of the input file at one time, thus
speeding up file operations. The output buffer is 32k and
should be larger in the next version. NARC requires about
170K of RAM to operate.
(This prime advertising space for rent)
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 8
══════════════════════
Error Messages.
══════════════════════
Memory Allocation Error.
- NARC allocates memory when it is invoked, this says that DOS told
NARC that there was not enough memory left to run the program
ERROR: Extraction Failed due to CRC error, Hit ENTER
- After a file is extracted, the CRC contained in the ARC header is
compared to the CRC that NARC calculates, this message says that
the two were different. This is the CRC-16 polynomial.
ERROR: Extraction Failed due to FileSize error
- Same as above, except with filesize
ERROR: Disk File Inconsistency. Hit ENTER
- This will usually mean that the user has swapped disks just after
telling NARC to View,Print or Extract and NARC does not recognize
the file.
ERROR: Incompatible Crunch Format
- Says that either the stowage code for this file is not supported by
NARC -OR- there is an error in the ARC header
ERROR: Extraction Failed due to Lack of Disk Space - (A)bort (C)ontinue
- Abort will stop tagged extraction, continue will try to fit next
file.
Squeezed File Has a Diseased Decode Tree.
- When unsqueezing a file, NARC has found a bad entry in the decode
table.
ERROR: No directory space on destination!
- Self explanatory
Bad Path Name, Hit ENTER
- The destination path that the user entered for extraction is not
a valid DOS pathname, re-enter.
Requires DOS version 2.0 or above.
- NARC requires DOS 2.0 or above to operate.
Invalid archive file format
- NARC could not find any ARC headers, this is probably not an ARC file
Warning: Bad archive file header, bytes skipped = xxxxx
- If an entry has a bad header, NARC will examine the next 64k bytes
looking for a good header. This is to maintain compatibility with
ARC v.5.20 which allows self-unpacking ARC files.
Unexpected end of ARChive file
- Says that NARC couldn't find the last ARC header
No matching file(s) in ARChive
- ARC file is empty
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 9
════════════════════════════════════════════
ARCHIVE FILE FORMATS AND GENERAL INFORMATION
════════════════════════════════════════════
For Those With a Little More Curiosity...
The following are the currently supported stowage methods.
1 unpacked (obsolete)
2 unpacked
3 packed
4 squeezed (after packing)
5 crunched (obsolete)
6 crunched (after packing) (obsolete)
7 crunched (after packing, using faster hash algorithm)
8 crunched (after packing, using dynamic LZW variations)
9 Squashed c/o Phil Katz (no packing) (var. on crunching)
NOTE: LZW is Lempel-Ziv-Welch crunching algorithm
A little about the stowage methods.
Packing -
This is the simplest of the storage methods. Suppose that
you have a line of text and at the end of the line, you
have 40 spaces. These 40 spaces are compressed into 3 bytes
in the ARC file. The first byte is the actual character to
be expanded (in our case a space). The second byte is a
special "flag" byte that indicates that we need to
expand these bytes. The third byte is the count byte (in our
case it would be 40). So you can see that any time the ARC'er
finds repeated bytes like this, it can compress them into 3
bytes. The anomalous case to watch out for here is when the
count byte is the same character as the "flag" byte, this
proved to be a difficult roach to kill !
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 10
══════════════════════════
HUFFMAN CODING (SQUEEZING)
══════════════════════════
It does, at first, seem that making a file smaller would be
an impossible task. I will make an attempt here to shed a
little light on this subject since that is a question that I
hear pretty frequently and it is not a two minute discussion
question. It does require some thought.
To compress a file with the Huffman algorithm, commonly
called squeezing, the first thing that must be done is
to read the file completely and count the occurrences of
each character. That is you count the "A" 's and the "B" 's
and so forth. There are 256 characters in the extended
ASCII character set, of which approximately 90 are
"printable", that is you can see them on the screen. The
IBM set has more "printables", but that is of no
consequence, since the squeezer deals only with the numbers
and doesn't care whether or not the file is an ASCII text
file or an EXE file. Once the squeezer has counted the
occurrences of each character, thus the frequency of
occurrence, it scans the table for the characters that
appear the least number of times and forms an imaginary
link between them, called a node. Somewhere else in the
tree, we will later develop a pointer that points to this
node. When you start putting all of these things
together, you will form a binary tree in memory. Confused
enough ? Let us try an example.
We have a file that is 100 bytes long and has 6
different characters in it. We have counted the
occurrence of each of the characters and found the following.
quantity character
5 - D
10 - A
10 - F
20 - B
25 - E
30 - C
The spelling in the file wasn't very good, but we don't
care. Now we take these numbers and will call them the
frequency of each character. We then arrange the table as
below. This is an arbitrary arrangement, but it is useful
here so as to make our tree readable on the screen. The
arrangement makes no difference.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 11
Frequency 20 10 5 10 30 25
Character B A D F C E
We then examine the table to find the two characters with
the smallest frequency of occurrence. In our case, it is
obvious that one of them is 5,but which 10 do we choose. As
it turns out, it doesn't matter which one you choose, we will
arbitrarily choose the F. We draw lines from the D and the F
to form our node (the box below).
Frequency 30 10 5 10 20 25
Character C A D F B E
\ /
\ /
╔══╗
║15║ = 5 + 10
╚══╝
The number in the box is the sum of the frequencies of
the D and F characters. Now we again look for the lowest
two frequencies, except, this time we do not consider the D
and F characters individually, we instead consider the node.
The lowest two now are the A and the node, that is 10 and
15. We again do some artwork.
Frequency 30 10 5 10 20 25
Character C A D F B E
\ \ /
\ \ /
\ ╔══╗
\ ║15║ = 5 + 10
\ ╚══╝
\ /
\ /
╔══╗
║25║ = 10 + 15
╚══╝
We look at the table again for the next two lowest
frequencies and now find B and E . We continue in this
fashion until the entire "tree" is built, that is until it
all condenses to one node. The leaves are the actual
characters at the top of the tree and the nodes represent
branch joints with the root at the bottom.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 12
Frequency 30 10 5 10 20 25
Character C A D F B E
\ \ \ / \ /
\ \ \ / \ /
\ \ ╔══╗ ╔══╗
\ \ ║15║ ║45║
\ \ ╚══╝ ╚══╝
\ \ / /
\ \ / /
\ ╔══╗ /
\ ║25║ /
\ ╚══╝ /
\ / /
\ / /
╔══╗ /
║55║ /
╚══╝ /
\ /
\ /
╔════╗
║ROOT║
╚════╝
Now that our tree is made up, we can encode the file. We
start at the root (always). To encode the first character
(leaf) of the tree (the letter C), we trace up the tree until
we hit the letter C at the top. Along our journey, if we
make a left turn, we record a 0 bit, and a 1 for a right
turn. So for the C, we would go left to 55 (and record a 0)
and then left again to the letter C (and record another 0),so
the Huffman code for our letter C is 00. For A we go left to
55, right to 25 and left to A and it becomes 010. By doing
all of the letters this way, we find the following.
C = 00 ( 2 bits )
A = 010 ( 3 bits )
D = 0110 ( 4 bits )
F = 0111 ( 4 bits )
B = 10 ( 2 bits )
E = 11 ( 2 bits )
Mind that the zeroes and ones above are bits and not
bytes. Each character was represented in the original
file by 8 bits (one byte) and since we have reduced the
number of bits needed to represent each character, we
therefore reduce the size of the file. The savings add up as
follows,
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 13
character frequency original bits squeezed bits savings
C 30 30 x 8 = 240 30 x 2 = 60 180
A 10 10 x 8 = 80 10 x 3 = 30 50
D 5 5 x 8 = 40 5 x 4 = 20 20
F 10 10 x 8 = 80 10 x 4 = 40 40
B 20 20 x 8 = 160 20 x 2 = 40 120
E 25 25 x 8 = 200 25 x 2 = 50 150
══════════ ══════ ═════ ═════
Totals 100 800 240 560
│ │
original file size ──────┘ │
squeezed file size ───────────────────────┘
240 is 30% of 800, so we have compressed this file by
70%. Golly Wally, that seems pretty good. The rub lies in the
fact that in order to reconstruct the original file, we
must have access to the decode tree and since each tree
will be different for each file, we must therefore save the
tree with the file.It turns out that the tree can have only
256 nodes in a bytewise compression technique and each node
will hold 4 bytes as pointers,a full table will be about 1k
long. The table in our example has 5 nodes plus the 6 leaf
nodes (where our characters are), totaling 11. 4 times 11
is 44 and if we add a few bytes for storing the node count
and some other statistics, our table is about 50 bytes long.
If we look at the 240 in the above table this gives the total
number of bits that it will take to encode the file, divide
240 by 8 to get the number of bytes (30) and add it to our
50, we get a compressed file size of 80 bytes. Since our
original file was 100 bytes, we have achieved a 20% reduction
in file size. Not bad. What we have really accomplished is a
translation of character sets, with our new set requiring
less space than the original ASCII set.
How far can we go ?
If we look at the maximums that we can obtain for the
different bit combinations in a optimally skewed tree, that
is a tree that is not exactly symmetrical, we find that we
can have only 4 - 2 bit codes, 8 - 3 bit codes, 16 - 4 bit
codes, 32 - 5 bit codes, 64 - 6 bit codes, 128 - 7 bit codes,
the remaining 4 will be 8 bit codes.
2 - 1 bit codes
4 - 2 bit codes
8 - 3 bit codes
16 - 4 bit codes
32 - 5 bit codes
64 - 6 bit codes
128 - 7 bit codes
--------
254
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 14
And since we have a total of 256 different bytes to
encode, the remaining 2 characters must have 8 bit codes. If
we add the number of bits that this represents,we find a
total of 1554 bits or 195 bytes. So at maximum, we have
compressed the 256 bytes to 195 or 33%, thus the idealistic
maximum that can be achieved with the Huffman algorithm is
33% when using a byte level implementation.
One final note; The Huffman scheme requires the input file
to be read twice, once to count characters and frequencies
and then again to do the actual encoding. The major
differences in Huffman coding and crunching lie in the fact
that crunching is a one pass operation and does not require
the table to be stored with the file. Both, however, are
extremely vulnerable to errors, for example, imagine what
would happen if you skipped one bit when squeezing the file,
all of the remaining characters in the file would become the
proverbial garbage, since we are looking at the file on a bit
level.
NARC uses the method described in K. & R. pp. 130 for
setting up the binary tree with several modifications. The
simple binary tree is acceptable for this, since the tree
never grows and therefore will never become unbalanced.
If you followed that, now go back and read the second
paragraph again, maybe it will make sense this time.
══════════════════════
CRUNCHING
══════════════════════
Crunching began with an article by J. Ziv and A. Lempel
in IEEE Trans. Information Theory, May 1977, where the
method was originally described. Terry A. Welch wrote a
definitive application article in IEEE Computer, June 1984
which described in detail how to apply the algorithm and
some common problems encountered. Thus the name LZW
compression.
Crunching takes the Huffman coding method a step
further as it does not include a table with the crunched
file. The crunching algorithm also "learns" as it proceeds
through the file. If it finds repeated strings in the file,
they will be encoded into a table. This table is s et up (in
NARC's implementation) as a 4096 by 3 table. Each entry
is formatted as <PREF>,<SUFFIX>, where PREF is a 2 byte
pointer to another entry. SUFFIX is the byte for this
entry. Representing the PREF's as pointers rather than values
speeds up most operations in NARC. This idea came from Bob
Freed and is very trick.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 15
One obvious benefit of crunched files is the fact that
there is no need to include the encoding table in the
compressed file as was the case with squeezing. Another great
benefit is the fact that crunching is a one pass operation
as opposed to the two pass situation in squeezed files.
Crunching begins by creating an "atomic" table, that is
a table in RAM that contains 256 entries, one for each
character in the extended ASCII set. The values range
sequentially from 0 to 255. The table entries are arranged as
follows.
Prefix Pointer (2 bytes) and Suffix byte (1 byte)
In this initial table setup, the Suffix bytes are the 0
through 255 mentioned above. These are the "atomic"
characters, in that they must be in the table before
crunching or uncrunching can begin, since all files contain
some portion of this character set. We do not know which
characters will be included in any given file and which ones
will be excluded,so we must include them all in our initial
table. Once this table is set up, we can begin crunching.
The Prefix pointer will contain a value that is a pointer
to another string. When the table is initially set up, there
are no other strings, so this Prefix pointer is set to a
special "Null" string, that is it points nowhere.We must be
careful when crunching the file, to look for this special
pointer and act accordingly when we encounter it.
This Prefix and Suffix business is used to "build"
long strings. If we read the input file and found that the
first character was the letter "I", we would look this
letter up in the string table by hashing (computing an
address). If we found the letter in the table (which we
certainly will on the first character), then we output it's
"hashed" address as a code to the output file (the
crunched file). Suppose then, that the next character
input from the file was the letter "D", the cruncher would
then look at the I and the D together to see if they exist
as a string in the table. Well of course, since this is the
second character of the file, we know that it doesn't, so
the cruncher forms a new entry in the string table. This
entry has as its' Prefix pointer, a value that "points" to
the letter "I" entry in the table, that we made a minute
ago. The suffix byte in this case would be the letter
"D". Now another code is output to the crunched file,
representing the letter "D". Well this is great, we have
now turned 2 bytes from the input file (16 bits) into 3 bytes
in the output file (24 bits). You are correct, crunching
begins by "not crunching" , but it is a crazy world ! The
real value becomes apparent when we run into this same
sequence "ID" in the input file again. Now we will find an
entry for it in the string table and we can output a single
12 bit code that stands for "ID", thus saving 4 bits. The
cruncher continues "learning" strings like this until the
file is crunched. It should be noted that the string table
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 16
is dynamically changing as the input file is processed.
The early versions of crunching implemented, stopped
"learning" once the string table was full. This gave a very
poor compression ratio in some files. Versions 8 and 9 have
an additional feature called adaptive reset, where the string
table is cleared and crunching begins all over again ! This
capability really helps the larger files more than smaller
files.
Details of Storage Versions
NARC supports all of the current "skrunching" algorithms. A
brief description of each follows.
Version 1
- "STORED" File is simply stored (obsolete now, 25 byte
header)
NOTE: Files stored with this version are rare.
Version 2
- "STORED" Current version of simple storage. This
version was implemented with the implementation of file
compression. The main difference in version 1 and 2 is
the ARC header (see header section below), version 1
has a header length 4 bytes smaller than any of the rest
of the storage methods since in version 1 there was no
need to store the original file length separately from
the stored file length since they were the same.
Version 3
- "PACKED" Repeated bytes are packed into a three byte
string (see Packing above)
Version 4
- "SQUEEZED" after packing. The file is first packed as
described above and then squeezed
Version 5
- "CRUNCHED" This is the first implementation of LZW
released. Uses fixed length codes and a complex hashing
function. (obsolete now) (See hashing below)
NOTE: Files compressed with this version are hard to find.
Version was released only one month when next version
came out.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 17
Version 6
- "CRUNCHED" after packing. The file is first packed and
then crunched. Uses fixed length codes and the same
hashing function as version 5.
Version 7
- "CRUNCHED" after packing. Same as version 6 except a
faster hashing function is used.
NOTE: Thom Henderson (author of ARC) has this to say about
version 7. "This approach was abandoned because dynamic
Lempel-Ziv works as well, and packs smaller also. However
inadvertent release of a developmental copy forces us to
leave this in."
Version 8
- "CRUNCHED" after packing. Uses variable length codes
in the crunched file (9 to 12 bits) and has a faster
hash function (actually not hashing at all, but for the
sake of consistency, we will call it that). This version
also resets the string table when it becomes full which
benefits the compression ratio of larger files. This
resetting is commonly called an adaptive reset.
Version 9
- "SQUASHING" (variation on crunching scheme). This
version uses the same hashing function as version 8 but
varies the crunching codes from 9 to 13 bits. There is
also no packing, which affords the string table the
opportunity to "learn" longer codes and thus improve the
compression ratio (benefits "real large" files).
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 18
ARC file header structure is same for both DOS and CP/M
Byte number Value(s) Meaning
══════════════════════════════════════════════════════════════════════════
1 1Ah Header Flag
2 0-9 Compression Version
3-15 --- ASCIIZ compressed filename
16-19 --- Compressed file size in bytes
Low Word, High Word with each word
in LoHi format
20-21 bits DOS date format
15-9 Year
8-5 Month
4-0 Day (all zeroes means no date)
22-23 bits DOS time format
15-11 Hours (military)
10-5 Minutes
4-0 Seconds
24-25 --- CRC-16 in LoHi format of uncompressed
file. ------- This is important.
26-29 --- Original uncompressed file size
NOTE: Version 1 files are not compressed
so the length can be found with
bytes 16-19, also the header len
for version 1 files is 25 bytes.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 19
Hashing.....
Hashing is simply an arithmetic way of coming up with
an address in a table. You have a set of input numbers
(codes) that will map one-to-one with the output codes in
an ideal situation. That is, each time you input a certain
number, you can rest assured that the output will always
return the same output number. This is not quite the case in
the current versions of .ARC files.The reason is that the
algorithm would require a MEG or so of RAM for
implementation. The utilization of a smaller string table in
all of the ARC programs introduces the possibility of
producing the same output number for 2 or more input
numbers. This is called a hash collision. This is handled
separately in .ARC files with what is called a "collision
table", which helps to locate the correct table entry.
There are three versions of "hashing" used in ARC files.
Hash1 - Key = upper 12 bits of lower 18 bits of unsigned square of
(prefix code + suffix byte) OR 800h
Used in stowage versions 5 and 6
Hash2 - Key = lower 12 bits of unsigned product of
(prefix code + suffix byte) * 15073
Used in stowage version 7
Hash3 - Key = next available address in table.
Used in stowage versions 8 and 9
CRC calculations -
NARC does not use the traditional table lookup method
for calculating the CRC of the file. The table approach
requires the table to be in RAM and takes up more space.
NARC calculates the CRC on the fly, which requires no table
and saves space. The algorithm is taken from the definitive
article by Aram Perez in IEEE Micro, June '83. The
polynomial is X^16 + X^15 + X^2 + X^1 which is not compatible
with the Xmodem CRC.
Gary Conway
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 20
══════════════════════════════
ARC RELEASE DATES AND VERSIONS
══════════════════════════════
These are the various versions of ARC.EXE that I have and
what versions of storage they supported. PKxARC supports all
of these methods as well since they were all originally
created by ARC.EXE.
Date Stowage Methods
Released Version Supported
05-01-85 3.10 Storing,Packing,Squeezing (1-4)
( version 5 in here somewhere)
06-26-85 4.10 Up to version 6 of crunching
11-18-85 4.50 Up to version 6 of crunching
12-04-85 4.52 Up to version 6 of crunching
( version 7 in here somewhere)
01-21-86 5.00 Up to version 8 of crunching
01-31-86 5.10 Up to version 8 of crunching
02-05-86 5.12 Up to version 8 of crunching
10-24-86 5.20 Up to version 8 of crunching
This list is compiled in an attempt to start some kind of
historical record as to what transpired in the ARC world. I
would be interested in hearing of additions.
NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 21
═════════════════════════
NARC.EXE REVISION HISTORY
═════════════════════════
Version 1.0 - First Release 6.10.87 (Beta Release)
Version 1.1 - Released 6.16.87 (Beta Release)
- Fixed bug in tagged extraction to other than default path
- Deletes partial file when extraction fails due to lack of disk space
- Fixed bug dealing with improper end of file condition in HEX view
- Stronger ARChive header checking. Handles special case where two
1Ah bytes are encountered after a bad header
- Fixed bug with PK zero length files
- Introduction of NARCCFG.EXE, replacing NARCLOAD.EXE and NARC.CFG
Version 1.2 - Released 8.10.87 (First "real" release)
- Fixed bug in ARC window when no ARC files. NARC wants F1,F2 or F3
but bug caused a wrong keypress to send "twirping" sound to spkr.
Program seemed to be locked up, but ALT key would continue.
- added PAGE UP, PAGE DOWN, HOME and END functions (great for big ARCs)
- corrected bug when using some of the single letter commands in
extracting (this was a stupid mistake on my part)
- added FIND command for locating sub-files (great for big ARCs)
- added file deletion
- added abort option to tagged extraction when disk full
- added display of number of tagged files and tagged bytes
- added formatted display of tagged file and tagged bytes
- changed write color in Blank routine
- enhanced extract and print abort
- check disk space BEFORE extraction
- added help facility
- 70% FASTER EXTRACTION !
- added code to check for seek past EOF in when bad ARC file has
enormous numbers for file sizes
- fixed ELUSIVE bug in repeated byte expansion when the count byte
was the REP byte
- added Tag All option (F6 key)
- added invert all tags option (F7 key)
- added sound toggle to NARCCFG.EXE (turns sound on and off)
End of NARC.DOC Copyright (c) 1987 Infinity Design Concepts
Page 22